Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 10105 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.4 MiB |
| Average record size in memory | 144.0 B |
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 6 |
| Boolean | 4 |
pdays is highly correlated with previous | High correlation |
previous is highly correlated with pdays | High correlation |
job is highly correlated with education | High correlation |
education is highly correlated with job | High correlation |
df_index is highly correlated with duration and 1 other field | High correlation |
age is highly correlated with job and 1 other field | High correlation |
job is highly correlated with age and 1 other field | High correlation |
marital is highly correlated with age | High correlation |
housing is highly correlated with month | High correlation |
contact is highly correlated with month | High correlation |
day is highly correlated with month | High correlation |
month is highly correlated with housing and 2 other fields | High correlation |
duration is highly correlated with df_index and 1 other field | High correlation |
pdays is highly correlated with poutcome | High correlation |
poutcome is highly correlated with pdays | High correlation |
deposit is highly correlated with df_index and 1 other field | High correlation |
df_index has unique values | Unique |
balance has 774 (7.7%) zeros | Zeros |
previous has 7568 (74.9%) zeros | Zeros |
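The zero shares in the two alerts above follow from the zero counts and the 10105 observations. The following is a minimal sketch of that arithmetic, assuming the percentage is the zero count divided by the row count, rounded to one decimal place. The helper name `zero_share` is hypothetical and is not part of pandas-profiling.

```python
def zero_share(values):
    """Return (zero_count, percent_of_rows) for a sequence of numbers."""
    values = list(values)
    zeros = sum(1 for v in values if v == 0)
    percent = round(100 * zeros / len(values), 1) if values else 0.0
    return zeros, percent


# Re-derive the reported percentages from the counts given in the alerts.
N_OBSERVATIONS = 10105
balance_pct = round(100 * 774 / N_OBSERVATIONS, 1)
previous_pct = round(100 * 7568 / N_OBSERVATIONS, 1)
print(balance_pct, previous_pct)

# The helper applied to a tiny illustrative column (not taken from the dataset).
print(zero_share([0, 1, 0, 2]))
```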
Reproduction
| Analysis started | 2022-09-28 07:31:51.328511 |
|---|---|
| Analysis finished | 2022-09-28 07:32:00.382023 |
| Duration | 9.05 seconds |
| Software version | pandas-profiling v3.2.0 |
df_index
Real number (ℝ≥0)
| Distinct | 10105 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5627.074715 |
| Minimum | 0 |
| Maximum | 11161 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 535.2 |
| Q1 | 2852 |
| median | 5684 |
| Q3 | 8413 |
| 95-th percentile | 10603.8 |
| Maximum | 11161 |
| Range | 11161 |
| Interquartile range (IQR) | 5561 |
Descriptive statistics
| Standard deviation | 3223.261961 |
|---|---|
| Coefficient of variation (CV) | 0.5728130733 |
| Kurtosis | -1.192566309 |
| Mean | 5627.074715 |
| Median Absolute Deviation (MAD) | 2776 |
| Skewness | -0.03073460102 |
| Sum | 56861590 |
| Variance | 10389417.67 |
| Monotonicity | Strictly increasing |
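Several entries in the two tables above are derived from others: the range is the maximum minus the minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below re-derives them from the reported values as a consistency check. The variable names are illustrative only.

```python
# Values copied from the quantile and descriptive statistics above.
minimum, maximum = 0, 11161
q1, q3 = 2852, 8413
mean = 5627.074715
std = 3223.261961

value_range = maximum - minimum   # reported Range: 11161
iqr = q3 - q1                     # reported IQR: 5561
cv = std / mean                   # reported CV: 0.5728130733
variance = std ** 2               # reported Variance: 10389417.67

print(value_range, iqr, round(cv, 6), round(variance, 2))
```

The same relations can be checked against the quantile and descriptive tables of the other numeric variables.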
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 7508 | 1 | < 0.1% |
| 7498 | 1 | < 0.1% |
| 7499 | 1 | < 0.1% |
| 7500 | 1 | < 0.1% |
| 7501 | 1 | < 0.1% |
| 7502 | 1 | < 0.1% |
| 7504 | 1 | < 0.1% |
| 7507 | 1 | < 0.1% |
| 7509 | 1 | < 0.1% |
| Other values (10095) | 10095 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 11161 | 1 | |
| 11160 | 1 | |
| 11159 | 1 | |
| 11158 | 1 | |
| 11157 | 1 | |
| 11156 | 1 | |
| 11155 | 1 | |
| 11154 | 1 | |
| 11153 | 1 | |
| 11152 | 1 |
age
Real number (ℝ≥0)
| Distinct | 76 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.89549728 |
| Minimum | 18 |
| Maximum | 95 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 26 |
| Q1 | 32 |
| median | 38 |
| Q3 | 48 |
| 95-th percentile | 61 |
| Maximum | 95 |
| Range | 77 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 11.73493055 |
|---|---|
| Coefficient of variation (CV) | 0.286949208 |
| Kurtosis | 0.6572927811 |
| Mean | 40.89549728 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.8677036244 |
| Sum | 413249 |
| Variance | 137.7085951 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 31 | 459 | 4.5% |
| 32 | 432 | 4.3% |
| 34 | 432 | 4.3% |
| 33 | 431 | 4.3% |
| 35 | 429 | 4.2% |
| 30 | 415 | 4.1% |
| 36 | 393 | 3.9% |
| 37 | 334 | 3.3% |
| 38 | 319 | 3.2% |
| 39 | 317 | 3.1% |
| Other values (66) | 6144 |
| Value | Count | Frequency (%) |
| 18 | 8 | 0.1% |
| 19 | 13 | 0.1% |
| 20 | 19 | 0.2% |
| 21 | 29 | 0.3% |
| 22 | 48 | 0.5% |
| 23 | 64 | 0.6% |
| 24 | 89 | |
| 25 | 162 | |
| 26 | 220 | |
| 27 | 221 |
| Value | Count | Frequency (%) |
| 95 | 1 | < 0.1% |
| 93 | 2 | |
| 92 | 2 | |
| 90 | 2 | |
| 89 | 1 | < 0.1% |
| 88 | 2 | |
| 87 | 3 | |
| 86 | 4 | |
| 85 | 3 | |
| 84 | 2 |
job
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 79.1 KiB |
Length
| Max length | 13 |
|---|---|
| Median length | 12 |
| Mean length | 9.36091044 |
| Min length | 6 |
Characters and Unicode
| Total characters | 94592 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | admin. |
|---|---|
| 2nd row | admin. |
| 3rd row | technician |
| 4th row | services |
| 5th row | admin. |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| management | 2315 | 22.9% |
| blue-collar | 1807 | 17.9% |
| technician | 1638 | 16.2% |
| admin. | 1246 | 12.3% |
| services | 868 | 8.6% |
| retired | 663 | 6.6% |
| self-employed | 358 | 3.5% |
| unemployed | 332 | 3.3% |
| student | 326 | 3.2% |
| entrepreneur | 300 | 3.0% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| e | 14653 | |
| n | 10410 | |
| a | 9573 | |
| m | 6818 | 7.2% |
| l | 6469 | 6.8% |
| i | 6305 | 6.7% |
| c | 5951 | 6.3% |
| t | 5568 | 5.9% |
| r | 4901 | 5.2% |
| d | 3177 | 3.4% |
| Other values (12) | 20767 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 91181 | |
| Dash Punctuation | 2165 | 2.3% |
| Other Punctuation | 1246 | 1.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 14653 | |
| n | 10410 | |
| a | 9573 | |
| m | 6818 | 7.5% |
| l | 6469 | 7.1% |
| i | 6305 | 6.9% |
| c | 5951 | 6.5% |
| t | 5568 | 6.1% |
| r | 4901 | 5.4% |
| d | 3177 | 3.5% |
| Other values (10) | 17356 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2165 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 1246 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 91181 | |
| Common | 3411 | 3.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 14653 | |
| n | 10410 | |
| a | 9573 | |
| m | 6818 | 7.5% |
| l | 6469 | 7.1% |
| i | 6305 | 6.9% |
| c | 5951 | 6.5% |
| t | 5568 | 6.1% |
| r | 4901 | 5.4% |
| d | 3177 | 3.5% |
| Other values (10) | 17356 |
Common
| Value | Count | Frequency (%) |
| - | 2165 | |
| . | 1246 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 94592 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 14653 | |
| n | 10410 | |
| a | 9573 | |
| m | 6818 | 7.2% |
| l | 6469 | 6.8% |
| i | 6305 | 6.7% |
| c | 5951 | 6.3% |
| t | 5568 | 5.9% |
| r | 4901 | 5.2% |
| d | 3177 | 3.4% |
| Other values (12) | 20767 |
marital
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 79.1 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 6.798515586 |
| Min length | 6 |
Characters and Unicode
| Total characters | 68699 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | married |
|---|---|
| 2nd row | married |
| 3rd row | married |
| 4th row | married |
| 5th row | married |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| married | 5715 | 56.6% |
| single | 3213 | 31.8% |
| divorced | 1177 | 11.6% |
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
| Value | Count | Frequency (%) |
| r | 12607 | |
| i | 10105 | |
| e | 10105 | |
| d | 8069 | |
| m | 5715 | |
| a | 5715 | |
| s | 3213 | 4.7% |
| n | 3213 | 4.7% |
| g | 3213 | 4.7% |
| l | 3213 | 4.7% |
| Other values (3) | 3531 | 5.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 68699 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 12607 | |
| i | 10105 | |
| e | 10105 | |
| d | 8069 | |
| m | 5715 | |
| a | 5715 | |
| s | 3213 | 4.7% |
| n | 3213 | 4.7% |
| g | 3213 | 4.7% |
| l | 3213 | 4.7% |
| Other values (3) | 3531 | 5.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 68699 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 12607 | |
| i | 10105 | |
| e | 10105 | |
| d | 8069 | |
| m | 5715 | |
| a | 5715 | |
| s | 3213 | 4.7% |
| n | 3213 | 4.7% |
| g | 3213 | 4.7% |
| l | 3213 | 4.7% |
| Other values (3) | 3531 | 5.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 68699 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| r | 12607 | |
| i | 10105 | |
| e | 10105 | |
| d | 8069 | |
| m | 5715 | |
| a | 5715 | |
| s | 3213 | 4.7% |
| n | 3213 | 4.7% |
| g | 3213 | 4.7% |
| l | 3213 | 4.7% |
| Other values (3) | 3531 | 5.1% |
education
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 79.1 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.412469075 |
| Min length | 7 |
Characters and Unicode
| Total characters | 85008 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | secondary |
|---|---|
| 2nd row | secondary |
| 3rd row | secondary |
| 4th row | secondary |
| 5th row | tertiary |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| secondary | 5517 | 54.6% |
| tertiary | 3239 | 32.1% |
| primary | 1349 | 13.3% |
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
| Value | Count | Frequency (%) |
| r | 14693 | |
| a | 10105 | |
| y | 10105 | |
| e | 8756 | |
| t | 6478 | |
| s | 5517 | 6.5% |
| c | 5517 | 6.5% |
| o | 5517 | 6.5% |
| n | 5517 | 6.5% |
| d | 5517 | 6.5% |
| Other values (3) | 7286 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 85008 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 14693 | |
| a | 10105 | |
| y | 10105 | |
| e | 8756 | |
| t | 6478 | |
| s | 5517 | 6.5% |
| c | 5517 | 6.5% |
| o | 5517 | 6.5% |
| n | 5517 | 6.5% |
| d | 5517 | 6.5% |
| Other values (3) | 7286 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 85008 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| r | 14693 | |
| a | 10105 | |
| y | 10105 | |
| e | 8756 | |
| t | 6478 | |
| s | 5517 | 6.5% |
| c | 5517 | 6.5% |
| o | 5517 | 6.5% |
| n | 5517 | 6.5% |
| d | 5517 | 6.5% |
| Other values (3) | 7286 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 85008 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| r | 14693 | |
| a | 10105 | |
| y | 10105 | |
| e | 8756 | |
| t | 6478 | |
| s | 5517 | 6.5% |
| c | 5517 | 6.5% |
| o | 5517 | 6.5% |
| n | 5517 | 6.5% |
| d | 5517 | 6.5% |
| Other values (3) | 7286 |
default
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 10.0 KiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 9939 | 98.4% |
| True | 166 | 1.6% |
balance
Real number (ℝ)
| Distinct | 2963 |
|---|---|
| Distinct (%) | 29.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 807.6535379 |
| Minimum | -2049 |
| Maximum | 4063 |
| Zeros | 774 |
| Zeros (%) | 7.7% |
| Negative | 682 |
| Negative (%) | 6.7% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | -2049 |
|---|---|
| 5-th percentile | -87 |
| Q1 | 95 |
| median | 445 |
| Q3 | 1227 |
| 95-th percentile | 3040.4 |
| Maximum | 4063 |
| Range | 6112 |
| Interquartile range (IQR) | 1132 |
Descriptive statistics
| Standard deviation | 994.1519657 |
|---|---|
| Coefficient of variation (CV) | 1.230913899 |
| Kurtosis | 1.112528777 |
| Mean | 807.6535379 |
| Median Absolute Deviation (MAD) | 433 |
| Skewness | 1.309601354 |
| Sum | 8161339 |
| Variance | 988338.1308 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 774 | 7.7% |
| 1 | 39 | 0.4% |
| 2 | 34 | 0.3% |
| 3 | 34 | 0.3% |
| 550 | 31 | 0.3% |
| 4 | 29 | 0.3% |
| 5 | 27 | 0.3% |
| 19 | 20 | 0.2% |
| 8 | 19 | 0.2% |
| 62 | 18 | 0.2% |
| Other values (2953) | 9080 |
| Value | Count | Frequency (%) |
| -2049 | 1 | |
| -1965 | 1 | |
| -1944 | 1 | |
| -1701 | 1 | |
| -1636 | 1 | |
| -1531 | 1 | |
| -1489 | 1 | |
| -1451 | 1 | |
| -1415 | 2 | |
| -1386 | 1 |
| Value | Count | Frequency (%) |
| 4063 | 1 | |
| 4062 | 1 | |
| 4060 | 1 | |
| 4056 | 1 | |
| 4054 | 1 | |
| 4053 | 1 | |
| 4048 | 1 | |
| 4047 | 1 | |
| 4041 | 2 | |
| 4040 | 1 |
housing
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 10.0 KiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 5243 | 51.9% |
| True | 4862 | 48.1% |
loan
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 10.0 KiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 8712 | 86.2% |
| True | 1393 | 13.8% |
contact
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 79.1 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.851558634 |
| Min length | 7 |
Characters and Unicode
| Total characters | 79340 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | unknown |
|---|---|
| 2nd row | unknown |
| 3rd row | unknown |
| 4th row | unknown |
| 5th row | unknown |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| cellular | 7283 | 72.1% |
| unknown | 2161 | 21.4% |
| telephone | 661 | 6.5% |
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
| Value | Count | Frequency (%) |
| l | 22510 | |
| u | 9444 | |
| e | 9266 | |
| c | 7283 | 9.2% |
| a | 7283 | 9.2% |
| r | 7283 | 9.2% |
| n | 7144 | 9.0% |
| o | 2822 | 3.6% |
| k | 2161 | 2.7% |
| w | 2161 | 2.7% |
| Other values (3) | 1983 | 2.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 79340 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| l | 22510 | |
| u | 9444 | |
| e | 9266 | |
| c | 7283 | 9.2% |
| a | 7283 | 9.2% |
| r | 7283 | 9.2% |
| n | 7144 | 9.0% |
| o | 2822 | 3.6% |
| k | 2161 | 2.7% |
| w | 2161 | 2.7% |
| Other values (3) | 1983 | 2.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 79340 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| l | 22510 | |
| u | 9444 | |
| e | 9266 | |
| c | 7283 | 9.2% |
| a | 7283 | 9.2% |
| r | 7283 | 9.2% |
| n | 7144 | 9.0% |
| o | 2822 | 3.6% |
| k | 2161 | 2.7% |
| w | 2161 | 2.7% |
| Other values (3) | 1983 | 2.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 79340 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| l | 22510 | |
| u | 9444 | |
| e | 9266 | |
| c | 7283 | 9.2% |
| a | 7283 | 9.2% |
| r | 7283 | 9.2% |
| n | 7144 | 9.0% |
| o | 2822 | 3.6% |
| k | 2161 | 2.7% |
| w | 2161 | 2.7% |
| Other values (3) | 1983 | 2.5% |
day
Real number (ℝ≥0)
| Distinct | 31 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.59030183 |
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 8 |
| median | 15 |
| Q3 | 22 |
| 95-th percentile | 30 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 14 |
Descriptive statistics
| Standard deviation | 8.441509961 |
|---|---|
| Coefficient of variation (CV) | 0.5414590463 |
| Kurtosis | -1.068760047 |
| Mean | 15.59030183 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.1323348611 |
| Sum | 157540 |
| Variance | 71.25909042 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=31)
| Value | Count | Frequency (%) |
| 18 | 493 | 4.9% |
| 20 | 492 | 4.9% |
| 5 | 449 | 4.4% |
| 30 | 430 | 4.3% |
| 15 | 426 | 4.2% |
| 13 | 419 | 4.1% |
| 6 | 413 | 4.1% |
| 14 | 408 | 4.0% |
| 12 | 403 | 4.0% |
| 8 | 394 | 3.9% |
| Other values (21) | 5778 |
| Value | Count | Frequency (%) |
| 1 | 109 | 1.1% |
| 2 | 290 | |
| 3 | 286 | |
| 4 | 364 | |
| 5 | 449 | |
| 6 | 413 | |
| 7 | 359 | |
| 8 | 394 | |
| 9 | 334 | |
| 10 | 148 | 1.5% |
| Value | Count | Frequency (%) |
| 31 | 126 | 1.2% |
| 30 | 430 | |
| 29 | 358 | |
| 28 | 377 | |
| 27 | 262 | |
| 26 | 233 | |
| 25 | 200 | |
| 24 | 112 | 1.1% |
| 23 | 210 | |
| 22 | 236 |
month
Categorical
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 79.1 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 30315 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | may |
|---|---|
| 2nd row | may |
| 3rd row | may |
| 4th row | may |
| 5th row | may |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| may | 2617 | 25.9% |
| jul | 1418 | 14.0% |
| aug | 1385 | 13.7% |
| jun | 1104 | 10.9% |
| apr | 830 | 8.2% |
| nov | 780 | 7.7% |
| feb | 709 | 7.0% |
| oct | 335 | 3.3% |
| jan | 319 | 3.2% |
| sep | 278 | 2.8% |
| Other values (2) | 330 | 3.3% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| a | 5388 | |
| u | 3907 | |
| m | 2854 | |
| j | 2841 | |
| y | 2617 | |
| n | 2203 | |
| l | 1418 | 4.7% |
| g | 1385 | 4.6% |
| o | 1115 | 3.7% |
| p | 1108 | 3.7% |
| Other values (9) | 5479 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 30315 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 5388 | |
| u | 3907 | |
| m | 2854 | |
| j | 2841 | |
| y | 2617 | |
| n | 2203 | |
| l | 1418 | 4.7% |
| g | 1385 | 4.6% |
| o | 1115 | 3.7% |
| p | 1108 | 3.7% |
| Other values (9) | 5479 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 30315 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 5388 | |
| u | 3907 | |
| m | 2854 | |
| j | 2841 | |
| y | 2617 | |
| n | 2203 | |
| l | 1418 | 4.7% |
| g | 1385 | 4.6% |
| o | 1115 | 3.7% |
| p | 1108 | 3.7% |
| Other values (9) | 5479 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 30315 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 5388 | |
| u | 3907 | |
| m | 2854 | |
| j | 2841 | |
| y | 2617 | |
| n | 2203 | |
| l | 1418 | 4.7% |
| g | 1385 | 4.6% |
| o | 1115 | 3.7% |
| p | 1108 | 3.7% |
| Other values (9) | 5479 |
duration
Real number (ℝ≥0)
| Distinct | 1390 |
|---|---|
| Distinct (%) | 13.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 368.7426027 |
| Minimum | 2 |
| Maximum | 3881 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 50 |
| Q1 | 137 |
| median | 252 |
| Q3 | 490 |
| 95-th percentile | 1075.8 |
| Maximum | 3881 |
| Range | 3879 |
| Interquartile range (IQR) | 353 |
Descriptive statistics
| Standard deviation | 346.6515237 |
|---|---|
| Coefficient of variation (CV) | 0.9400907873 |
| Kurtosis | 7.797566486 |
| Mean | 368.7426027 |
| Median Absolute Deviation (MAD) | 142 |
| Skewness | 2.19978852 |
| Sum | 3726144 |
| Variance | 120167.2789 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 161 | 36 | 0.4% |
| 97 | 35 | 0.3% |
| 136 | 34 | 0.3% |
| 144 | 33 | 0.3% |
| 114 | 33 | 0.3% |
| 150 | 33 | 0.3% |
| 87 | 32 | 0.3% |
| 90 | 32 | 0.3% |
| 158 | 32 | 0.3% |
| 152 | 32 | 0.3% |
| Other values (1380) | 9773 |
| Value | Count | Frequency (%) |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 2 | < 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 6 | 0.1% |
| 7 | 15 | |
| 8 | 15 | |
| 9 | 10 | |
| 10 | 15 | |
| 11 | 10 |
| Value | Count | Frequency (%) |
| 3881 | 1 | |
| 3284 | 1 | |
| 3253 | 1 | |
| 3183 | 1 | |
| 3102 | 1 | |
| 3094 | 1 | |
| 3076 | 1 | |
| 2775 | 1 | |
| 2770 | 1 | |
| 2769 | 1 |
campaign
Real number (ℝ≥0)
| Distinct | 35 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.517169718 |
| Minimum | 1 |
| Maximum | 43 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 7 |
| Maximum | 43 |
| Range | 42 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 2.707158757 |
|---|---|
| Coefficient of variation (CV) | 1.075477246 |
| Kurtosis | 39.56382556 |
| Mean | 2.517169718 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 4.937566318 |
| Sum | 25436 |
| Variance | 7.328708537 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=35)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 4331 | 42.9% |
| 2 | 2749 | 27.2% |
| 3 | 1192 | 11.8% |
| 4 | 696 | 6.9% |
| 5 | 347 | 3.4% |
| 6 | 239 | 2.4% |
| 7 | 126 | 1.2% |
| 8 | 117 | 1.2% |
| 9 | 64 | 0.6% |
| 10 | 47 | 0.5% |
| Other values (25) | 197 | 1.9% |
| Value | Count | Frequency (%) |
| 1 | 4331 | |
| 2 | 2749 | |
| 3 | 1192 | 11.8% |
| 4 | 696 | 6.9% |
| 5 | 347 | 3.4% |
| 6 | 239 | 2.4% |
| 7 | 126 | 1.2% |
| 8 | 117 | 1.2% |
| 9 | 64 | 0.6% |
| 10 | 47 | 0.5% |
| Value | Count | Frequency (%) |
| 43 | 2 | |
| 41 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
| 32 | 2 | |
| 31 | 1 | < 0.1% |
| 30 | 4 | |
| 29 | 2 | |
| 28 | 1 | < 0.1% |
| 27 | 1 | < 0.1% |
| 26 | 3 |
pdays
Real number (ℝ)
| Distinct | 458 |
|---|---|
| Distinct (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 51.31964374 |
| Minimum | -1 |
| Maximum | 854 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 7568 |
| Negative (%) | 74.9% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -1 |
| Q1 | -1 |
| median | -1 |
| Q3 | 2 |
| 95-th percentile | 329 |
| Maximum | 854 |
| Range | 855 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 109.6441789 |
|---|---|
| Coefficient of variation (CV) | 2.136495324 |
| Kurtosis | 6.885033932 |
| Mean | 51.31964374 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.463076113 |
| Sum | 518585 |
| Variance | 12021.84596 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -1 | 7568 | |
| 92 | 88 | 0.9% |
| 182 | 77 | 0.8% |
| 181 | 75 | 0.7% |
| 91 | 74 | 0.7% |
| 183 | 68 | 0.7% |
| 184 | 40 | 0.4% |
| 94 | 39 | 0.4% |
| 93 | 38 | 0.4% |
| 95 | 33 | 0.3% |
| Other values (448) | 2005 | 19.8% |
| Value | Count | Frequency (%) |
| -1 | 7568 | |
| 1 | 8 | 0.1% |
| 2 | 8 | 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 3 | < 0.1% |
| 8 | 2 | < 0.1% |
| 9 | 7 | 0.1% |
| 10 | 3 | < 0.1% |
| 12 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 854 | 1 | |
| 842 | 1 | |
| 828 | 1 | |
| 826 | 1 | |
| 805 | 1 | |
| 804 | 1 | |
| 792 | 1 | |
| 784 | 1 | |
| 782 | 1 | |
| 778 | 1 |
previous
Real number (ℝ≥0)
| Distinct | 30 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.8162295893 |
| Minimum | 0 |
| Maximum | 58 |
| Zeros | 7568 |
| Zeros (%) | 74.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 79.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 5 |
| Maximum | 58 |
| Range | 58 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 2.243794521 |
|---|---|
| Coefficient of variation (CV) | 2.748974737 |
| Kurtosis | 113.4393749 |
| Mean | 0.8162295893 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.440263081 |
| Sum | 8248 |
| Variance | 5.034613851 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=30)
| Value | Count | Frequency (%) |
| 0 | 7568 | |
| 1 | 796 | 7.9% |
| 2 | 612 | 6.1% |
| 3 | 391 | 3.9% |
| 4 | 223 | 2.2% |
| 5 | 147 | 1.5% |
| 6 | 107 | 1.1% |
| 7 | 66 | 0.7% |
| 8 | 58 | 0.6% |
| 9 | 28 | 0.3% |
| Other values (20) | 109 | 1.1% |
| Value | Count | Frequency (%) |
| 0 | 7568 | |
| 1 | 796 | 7.9% |
| 2 | 612 | 6.1% |
| 3 | 391 | 3.9% |
| 4 | 223 | 2.2% |
| 5 | 147 | 1.5% |
| 6 | 107 | 1.1% |
| 7 | 66 | 0.7% |
| 8 | 58 | 0.6% |
| 9 | 28 | 0.3% |
| Value | Count | Frequency (%) |
| 58 | 1 | |
| 55 | 1 | |
| 41 | 1 | |
| 37 | 1 | |
| 30 | 1 | |
| 29 | 1 | |
| 27 | 2 | |
| 23 | 2 | |
| 22 | 1 | |
| 20 | 2 |
poutcome
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 79.1 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 6.904799604 |
| Min length | 5 |
Characters and Unicode
| Total characters | 69773 |
|---|---|
| Distinct characters | 15 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | unknown |
|---|---|
| 2nd row | unknown |
| 3rd row | unknown |
| 4th row | unknown |
| 5th row | unknown |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| unknown | 7570 | 74.9% |
| failure | 1109 | 11.0% |
| success | 945 | 9.4% |
| other | 481 | 4.8% |
Length
Histogram of lengths of the category
Category Frequency Plot
Most occurring characters
| Value | Count | Frequency (%) |
| n | 22710 | |
| u | 9624 | |
| o | 8051 | 11.5% |
| k | 7570 | 10.8% |
| w | 7570 | 10.8% |
| s | 2835 | 4.1% |
| e | 2535 | 3.6% |
| c | 1890 | 2.7% |
| r | 1590 | 2.3% |
| f | 1109 | 1.6% |
| Other values (5) | 4289 | 6.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 69773 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 22710 | |
| u | 9624 | |
| o | 8051 | 11.5% |
| k | 7570 | 10.8% |
| w | 7570 | 10.8% |
| s | 2835 | 4.1% |
| e | 2535 | 3.6% |
| c | 1890 | 2.7% |
| r | 1590 | 2.3% |
| f | 1109 | 1.6% |
| Other values (5) | 4289 | 6.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 69773 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 22710 | |
| u | 9624 | |
| o | 8051 | 11.5% |
| k | 7570 | 10.8% |
| w | 7570 | 10.8% |
| s | 2835 | 4.1% |
| e | 2535 | 3.6% |
| c | 1890 | 2.7% |
| r | 1590 | 2.3% |
| f | 1109 | 1.6% |
| Other values (5) | 4289 | 6.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 69773 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 22710 | |
| u | 9624 | |
| o | 8051 | 11.5% |
| k | 7570 | 10.8% |
| w | 7570 | 10.8% |
| s | 2835 | 4.1% |
| e | 2535 | 3.6% |
| c | 1890 | 2.7% |
| r | 1590 | 2.3% |
| f | 1109 | 1.6% |
| Other values (5) | 4289 | 6.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
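The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is an illustrative sketch, not the code the report uses. The function names are hypothetical, ties receive the average of their positions, and constant inputs (zero variance) are not handled.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is (numerically) 1, r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```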
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
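The invariance property mentioned above can be checked numerically. The sketch below computes r as the covariance divided by the product of the standard deviations, then recomputes it after shifting and rescaling each variable separately with positive scale factors. The function name and the sample data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_original = pearson_r(x, y)

# Change location and scale of each variable independently (positive scales).
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r_original, r_shifted)
```

Both calls give the same value up to floating-point rounding. A negative scale factor on one variable would flip the sign of r.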
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
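The pair-counting definition above corresponds to the variant usually called tau-a. The following is a minimal sketch of it. The function name is hypothetical, and library implementations commonly use the tie-adjusted tau-b variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant, one discordant (the swapped middle values).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```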
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | df_index | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 59 | admin. | married | secondary | no | 2343.0 | yes | no | unknown | 5 | may | 1042 | 1 | -1 | 0 | unknown | yes |
| 1 | 1 | 56 | admin. | married | secondary | no | 45.0 | no | no | unknown | 5 | may | 1467 | 1 | -1 | 0 | unknown | yes |
| 2 | 2 | 41 | technician | married | secondary | no | 1270.0 | yes | no | unknown | 5 | may | 1389 | 1 | -1 | 0 | unknown | yes |
| 3 | 3 | 55 | services | married | secondary | no | 2476.0 | yes | no | unknown | 5 | may | 579 | 1 | -1 | 0 | unknown | yes |
| 4 | 4 | 54 | admin. | married | tertiary | no | 184.0 | no | no | unknown | 5 | may | 673 | 2 | -1 | 0 | unknown | yes |
| 5 | 5 | 42 | management | single | tertiary | no | 0.0 | yes | yes | unknown | 5 | may | 562 | 2 | -1 | 0 | unknown | yes |
| 6 | 6 | 56 | management | married | tertiary | no | 830.0 | yes | yes | unknown | 6 | may | 1201 | 1 | -1 | 0 | unknown | yes |
| 7 | 7 | 60 | retired | divorced | secondary | no | 545.0 | yes | no | unknown | 6 | may | 1030 | 1 | -1 | 0 | unknown | yes |
| 8 | 8 | 37 | technician | married | secondary | no | 1.0 | yes | no | unknown | 6 | may | 608 | 1 | -1 | 0 | unknown | yes |
| 9 | 9 | 28 | services | single | secondary | no | 550.0 | yes | no | unknown | 6 | may | 1297 | 3 | -1 | 0 | unknown | yes |
Last rows
| | df_index | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10095 | 11152 | 34 | housemaid | married | secondary | no | 390.0 | yes | no | cellular | 15 | jul | 659 | 3 | -1 | 0 | unknown | no |
| 10096 | 11153 | 43 | admin. | single | secondary | no | 35.0 | no | no | telephone | 9 | nov | 208 | 1 | -1 | 0 | unknown | no |
| 10097 | 11154 | 52 | technician | married | tertiary | no | 523.0 | yes | yes | cellular | 8 | jul | 113 | 1 | -1 | 0 | unknown | no |
| 10098 | 11155 | 35 | blue-collar | married | secondary | no | 80.0 | yes | yes | cellular | 21 | nov | 38 | 2 | 172 | 2 | failure | no |
| 10099 | 11156 | 34 | blue-collar | single | secondary | no | -72.0 | yes | no | cellular | 7 | jul | 273 | 5 | -1 | 0 | unknown | no |
| 10100 | 11157 | 33 | blue-collar | single | primary | no | 1.0 | yes | no | cellular | 20 | apr | 257 | 1 | -1 | 0 | unknown | no |
| 10101 | 11158 | 39 | services | married | secondary | no | 733.0 | no | no | unknown | 16 | jun | 83 | 4 | -1 | 0 | unknown | no |
| 10102 | 11159 | 32 | technician | single | secondary | no | 29.0 | no | no | cellular | 19 | aug | 156 | 2 | -1 | 0 | unknown | no |
| 10103 | 11160 | 43 | technician | married | secondary | no | 0.0 | no | yes | cellular | 8 | may | 9 | 2 | 172 | 5 | failure | no |
| 10104 | 11161 | 34 | technician | married | secondary | no | 0.0 | no | no | cellular | 9 | jul | 628 | 1 | -1 | 0 | unknown | no |